Update to transformers v5#30566

Merged
simon-mo merged 201 commits into vllm-project:main from hmellor:transformers-v5 on Apr 15, 2026

Conversation

@hmellor
Member

@hmellor hmellor commented Dec 12, 2025

Changes:

  • Update Transformers pin to 5.5.3
  • Update Tokenizers pin to 0.22.2 (as required by Transformers 5.0.0)
  • Update PEFT lower bound to 0.18.1 so that huggingface/peft@41c07f0 is included (guards import of HybridCache on Transformers version)
  • Update Accelerate pin to 1.13.0 so that 4-bit bnb works on Transformers v5
  • Update Mamba pin to 2.3.0 so that state-spaces/mamba@35e927b is included (removes an import that was deleted in Transformers v5)
  • Update compressed-tensors to 0.15.0, the earliest version that supports Transformers v5
  • Add HF_HUB_DOWNLOAD_TIMEOUT=60 to the CI environment to deal with the shortened timeout in huggingface-hub>=1 since it switched to httpx
  • Add a backward-compatibility test suite that runs the same tests as "Transformers nightly", but with 4.57.5 installed

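For reference, the pin changes above can be summarized as the following requirements fragment (versions taken from the list; the file layout is illustrative, not the exact vLLM requirements files):

```
transformers==5.5.3
tokenizers==0.22.2
peft>=0.18.1
accelerate==1.13.0
mamba-ssm==2.3.0
compressed-tensors==0.15.0
```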
Some architectures/tests need to be skipped in order to get this upgrade through. We need this upgrade because remaining on Transformers v4 blocks proper support of SoTA architectures released after Transformers v5. This is not a commitment to drop these architectures forever, simply a temporary measure; we plan to restore these architectures/tests following the upgrade.

Architectures/models that will no longer work after the upgrade:

  • Plamo2ForCausalLM - Custom model code uses _tied_weight_keys: list[str] but Transformers v5 now expects _tied_weight_keys: dict[str, str]
  • OpenCUAForConditionalGeneration - Custom code is not compatible with Transformers v5
  • OpenPanguVLForConditionalGeneration - OpenPanguVLVideoProcessorInitKwargs does not specify total=False, making all kwargs required
  • Alibaba-NLP/gte-Qwen2-1.5B-instruct - numerical issues with this model
  • PaddlePaddle/PaddleOCR-VL - imports deleted object
  • Custom tokenizer not compatible with Transformers v5:
    • InternS1ForConditionalGeneration
    • BAAI/bge-code-v1
    • XverseForCausalLM
  • Custom processor not compatible with Transformers v5:
    • Ovis2_5
    • Ovis2_6_MoeForCausalLM
    • MiniCPMO
    • MiniCPMV
    • Phi4ForCausalLMV
  • Custom config not compatible with Transformers v5:
    • InternLM2VEForCausalLM
    • HCXVisionForCausalLM
    • Tarsier2ForConditionalGeneration
    • SarvamMLAForCausalLM

Tests that are disabled after upgrade:

  • VLM tests for intern_vl, isaac, ultravox because these models are broken in Transformers v5 and therefore the HF reference cannot be generated
  • The following checkpoints because HF reference cannot be generated:
    • jinaai/jina-embeddings-v3
    • OpenGVLab/InternViT-*
    • InternVisionModel
    • jinaai/jina-reranker-m0
    • nvidia/NVIDIA-Nemotron-Parse-v1.1
    • ColQwen3

Supplementary PRs:

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@hmellor hmellor added the ready ONLY add when PR is ready to merge/full CI is needed label Dec 12, 2025
@mergify mergify bot added the ci/build label Dec 12, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request aims to update the transformers library to version 5. The changes correctly update the version in requirements/test.in and requirements/nightly_torch_test.txt, and also add the --pre flag to uv pip install in the Dockerfile to allow installation of the release candidate. However, there is a critical oversight: requirements/common.txt still contains a constraint transformers < 5. This will lead to build failures for any configuration that relies on common.txt. This file must be updated to allow transformers v5 for this PR to be mergeable.

Comment thread requirements/nightly_torch_test.txt Outdated
Comment thread requirements/test.in Outdated
@hmellor hmellor marked this pull request as ready for review December 12, 2025 17:56
@chatgpt-codex-connector

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.

@hmellor hmellor changed the title update to transformers v5 Update to transformers v5 Dec 15, 2025
@hmellor hmellor linked an issue Dec 17, 2025 that may be closed by this pull request
1 task
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Comment thread requirements/nightly_torch_test.txt Outdated
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@hmellor hmellor linked an issue Jan 27, 2026 that may be closed by this pull request
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@hmellor hmellor requested a review from tjtanaa as a code owner January 27, 2026 23:32
@mergify mergify bot added the rocm Related to AMD ROCm label Jan 27, 2026
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@mergify
Contributor

mergify bot commented Jan 28, 2026

Documentation preview: https://vllm--30566.org.readthedocs.build/en/30566/

@mergify mergify bot added the documentation Improvements or additions to documentation label Jan 28, 2026
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
hmellor and others added 6 commits April 14, 2026 10:48
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: khluu <khluu000@gmail.com>
Disable fused ops (VLLM_CPU_CI_ENV=0) for the untrained tiny-mixtral
model on CPU to reduce bfloat16 rounding that causes logprob divergence.
Also pass VLLM_CPU_ATTN_SPLIT_KV=0 to the CPU CI docker container.

Co-authored-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>
@khluu
Collaborator

khluu commented Apr 15, 2026

I believe we took care of all the CI failures from the transformers v5 upgrade. Thanks @bigPYJ1151 for the CPU fix!

Running full CI again now: https://buildkite.com/vllm/ci/builds/61345 (hopefully it's the last)

khluu added 2 commits April 15, 2026 05:05
Signed-off-by: khluu <khluu000@gmail.com>
XVERSE tokenizer is incompatible with transformers v5 due to an
add_prefix_space / prepend_scheme mismatch in tokenizer.json that
causes loading to fail. Cap at transformers<=4.57 until upstream fixes.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>
@khluu
Collaborator

khluu commented Apr 15, 2026

Claude's approach for basics model test (extra init 2)

    Skip XverseForCausalLM tests on transformers v5

    The XVERSE tokenizer (xverse/XVERSE-7B-Chat) is incompatible with transformers v5:
    AutoTokenizer.from_pretrained fails with "add_prefix_space does not match declared prepend_scheme"
    due to a mismatch in the model's tokenizer.json. This is an upstream issue in the XVERSE tokenizer
    files, not in vLLM or transformers.

    Added max_transformers_version="4.57" with transformers_version_reason={"vllm": ...} so both
    test_registry_imports and test_can_initialize_large_subset skip this model on transformers v5.

khluu added 2 commits April 15, 2026 06:05
Signed-off-by: khluu <khluu000@gmail.com>
Move _get_lora_aux_cuda_stream, lora_linear_async, and the custom op
registration out of the `if envs.VLLM_LORA_ENABLE_DUAL_STREAM:` block.

The block was evaluated at import time, but test fixtures set the env
var via monkeypatch after import, causing NameError / AttributeError
when the runtime code tried to call these functions.  They are only
invoked when `_enable_aux_cuda_stream` is True (checked at runtime),
so defining them unconditionally is safe.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>
@khluu
Collaborator

khluu commented Apr 15, 2026

Claude's fix for this test: https://buildkite.com/vllm/ci/builds/61345#019d8f38-a3e5-47f5-94aa-031e3b466e29/L3122

test_olmoe_lora — NameError: _get_lora_aux_cuda_stream is not defined (cb03f5d)

The _get_lora_aux_cuda_stream function, lora_linear_async, and the direct_register_custom_op call
were all inside an if envs.VLLM_LORA_ENABLE_DUAL_STREAM: block evaluated at import time. The test
fixture sets the env var via monkeypatch.setenv after import, so the names were never defined when
the runtime code tried to use them. Moved all definitions outside the conditional — they're only
invoked when _enable_aux_cuda_stream is True (checked at runtime in _init_lora_stream_context and
apply), so registering them unconditionally is safe.
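The import-time gating pitfall described above can be sketched in a few lines. Names here mirror the description but are illustrative, not vLLM's actual LoRA code:

```python
import os

# Sketch of the import-time gating pitfall (names are illustrative).
# BAD: definitions only exist if the env var was set *before* import:
#
#   if os.environ.get("VLLM_LORA_ENABLE_DUAL_STREAM") == "1":
#       def lora_linear_async(...): ...
#
# A test that sets the env var via monkeypatch *after* import then hits
# NameError when the runtime code calls the never-defined function.

# GOOD: define unconditionally, gate at call time.
def lora_linear_async(x: float) -> float:
    return x * 2

def apply(x: float) -> float:
    # Runtime check: the env var is read when called, not when imported.
    if os.environ.get("VLLM_LORA_ENABLE_DUAL_STREAM") == "1":
        return lora_linear_async(x)
    return x
```

Defining the functions unconditionally is safe because the environment variable still decides whether they are ever invoked.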

@khluu
Collaborator

khluu commented Apr 15, 2026

Claude's fix for step3 tool parser

  1. test_step3_tool_parser — spaces stripped in streaming tests (e187e72)

Transformers v5's LlamaTokenizerFast.__init__ unconditionally replaces the pre-tokenizer from
tokenizer.json with Metaspace. For models like stepfun-ai/step3 whose tokenizer uses ByteLevel, this
causes spaces to be silently dropped. Added _restore_original_pretokenizer() in vllm/tokenizers/hf.py
that detects the mismatch and restores the original pre-tokenizer/decoder from tokenizer.json.

  2. test_minimax_tool_parser — trust_remote_code error (e187e72)

Transformers v5 now calls AutoConfig.from_pretrained internally during tokenizer loading. For
custom-code models like MiniMax, this requires trust_remote_code=True. Added the missing parameter to
the test fixture.

Wrap the get_config() call in get_tokenizer() with contextlib.suppress
so it gracefully handles paths that don't contain a config.json (e.g.
LoRA adapter directories passed as tokenizer paths).  The config
pre-registration is only needed for custom vllm configs and is
irrelevant for adapter or tokenizer-only paths.

Fixes test_quant_model_lora failure.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>
@khluu
Collaborator

khluu commented Apr 15, 2026

Claude's fix for https://buildkite.com/vllm/ci/builds/61345#019d8f38-789f-401f-b021-183d4141b2f8/L2600

Fix test_quant_model_lora crash — LoRA adapter path passed to get_config() (cc19a1b)

The get_config() call added in get_tokenizer() (commit 8f551d0) pre-registers custom vllm configs
with AutoConfig before tokenizer loading. However, tokenizer_name can be a LoRA adapter directory
(e.g. jashing/tinyllama-colorist-lora), which doesn't contain a config.json — only
adapter_config.json. This causes get_config() to raise ValueError: Invalid repository ID or local
directory specified.

Wrapped the call with contextlib.suppress(ValueError, OSError) since the config pre-registration is
only relevant for models with custom vllm configs, not for LoRA adapter or tokenizer-only paths.
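The best-effort pre-registration pattern described above is just a `contextlib.suppress` wrapper. A minimal sketch, assuming stand-in implementations (these are not vLLM's real `get_config`/`get_tokenizer`):

```python
import contextlib

# Sketch of tolerant config pre-registration (stand-in functions, not vLLM's).

def get_config(path: str) -> dict:
    """Stand-in: LoRA adapter dirs have adapter_config.json but no config.json."""
    if path.endswith("-lora"):  # simulate an adapter directory
        raise ValueError("Invalid repository ID or local directory specified")
    return {"model_type": "demo"}

def get_tokenizer(path: str) -> str:
    # Pre-registering custom configs is best-effort: adapter or tokenizer-only
    # paths have no config.json, so swallow the expected failure modes.
    with contextlib.suppress(ValueError, OSError):
        get_config(path)
    return f"tokenizer-for-{path}"
```

Tokenizer loading now succeeds whether or not the path carries a usable config.json.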

hmellor added 2 commits April 15, 2026 08:27
This reverts commit e187e72.

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@hmellor
Member Author

hmellor commented Apr 15, 2026

For step3_vl, the issue is that tokenizer_config.json explicitly sets the wrong tokenizer_class (it is not a LlamaTokenizerFast).

I've pushed 816db8b which uses TokenizersBackend instead. This class determines what the tokenizer is from the actual tokenizer.json and is significantly more reliable. We can use this mechanism for other checkpoints we find with the same issue.

The longer term solution is to upstream this override to Transformers, which I have done in huggingface/transformers#45449.

hmellor added 2 commits April 15, 2026 09:59
This reverts commit cb03f5d.

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
@hmellor
Member Author

hmellor commented Apr 15, 2026

I've moved the fix for VLLM_LORA_ENABLE_DUAL_STREAM to the test code so that the behaviour in vLLM is unchanged.

hmellor and others added 3 commits April 15, 2026 11:06
These models fail with `AttributeError: 'dict' object has no
attribute '__name__'` on transformers v5.2+.  Add
max_transformers_version="5.1" until upstream compatibility is fixed.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>
@khluu
Collaborator

khluu commented Apr 15, 2026

full CI run: https://buildkite.com/vllm/ci/builds/61509
hopefully we can close this at exactly 200 commits xD

The processing test uses check_version_reason="vllm", so the skip
reason must be "vllm" not "hf" to actually take effect.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: khluu <khluu000@gmail.com>

Labels

  • ci/build
  • cpu: Related to CPU backends
  • documentation: Improvements or additions to documentation
  • intel-gpu: Related to Intel GPU
  • multi-modality: Related to multi-modality (#4194)
  • nvidia
  • qwen: Related to Qwen models
  • rocm: Related to AMD ROCm
  • tool-calling
  • v1

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

  • Upgrade to Transformers v5
  • [Feature]: Support for transformers 5.2.0
  • Bump transformers to 5.0.0
  • [Feature]: Support transformers>=5

7 participants